Abstract



In this paper, I would like to explore some difficult questions related to topics in
discourse analysis and offer a partial solution to some of them. In particular, I will address
the issue of levels in discourse analysis and how the various approaches taken within the
field can be classified according to a leveled model. I then want to consider an approach I
have been pursuing for representing the semantics of discourse, and consider how it fits
into the proposed model for discourse analysis. Finally, I describe the implementation of a
system which models the behavior of the proposed model.


1. Approaches to Discourse Analysis


There has been a great deal of renewed interest lately in the area of discourse analysis,
motivated in part by the influence of researchers in Artificial Intelligence (AI) attempting
to design “natural language conversation systems.” As with many branches of AI, it at times
appears as though we are reinventing the wheel, failing to take stock of past work done in
related disciplines such as linguistics, philosophy, and psychology. However, much of the
work has added new and complex dimensions to the study of discourse analysis (including
speech act theory). I am thinking in particular of the works of Allen, Cohen and Perrault on
the role of planning in speech acts; Wilks and Bien and the Point of View principle; and the
recent work done on conversational moves and clue words by Webber, Grosz, Sidner, Reichman,
and others. What these approaches immediately have in common is that they are concerned
with process-oriented models of discourse understanding rather than claiming to be
competence models.

1.1  Setting  the  Stage 


In  this  section  I  will  review  what  I  think  is  crucial  to  discourse  analysis  and 
semantics.  In  the  next  section  I  will  survey  the  work  done  in  the  field  and  classify  this 
research  according  to  three  general  approaches.  Then  the  limitations  of  each  of  these 
approaches  will  be  discussed  in  some  detail.  In  the  following  section  I  will  outline  an 
integrated  theory  of  discourse  semantics,  building  on  the  research  discussed  in  the  previous 
sections. Finally, in Section 4.0, I discuss an implementation of a program, CICERO, which
embodies  much  of  the  theory  presented  here. 

In what follows I will attempt to classify the different factors influencing the
“understanding” of a discourse, and how these have been analyzed and dealt with in the
field. I will assume a traditional classification of the communicative content of an
utterance, U.1

(1)

1. Truth-conditional semantics for U.

2. Entailments from U.

3. Presuppositions from U.

4. Conventional implicatures from U.

5. Conversational implicatures from U.

6. Felicity conditions associated with U.

1 I will follow Grice's classification as being essentially correct. See Grice (1971, 1968, 1969) for
further discussion.


The distinction here between entailment and presupposition is, of course, the familiar one
(Strawson (1952)). “Entailment” we will identify with “logical consequence” and state
informally as:

(2) α semantically entails β iff every situation that makes α true makes β true.

Presupposition will be defined as follows:2

(3) α presupposes β iff:

i. if α is true, then β is true;

ii. if α is false, then β is true.

Some classic examples will illustrate this distinction. Consider the sentences below in
(4) and (5).

(4)  a.  All  of  John's  children  are  asleep. 

b.  John  has  children. 

(5)  a.  John  has  stopped  beating  his  wife. 

b.  John  was  beating  his  wife. 

Sentence (4a) is said to “semantically presuppose” (4b), but not entail it. For if sentence
(4b) is false, then we say that (4a) lacks a truth value.
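The two definitions can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: a situation is modeled as an assignment of True, False, or None (no truth value) to sentence labels, the three situations are invented for the purpose, and the pair "e1"/"e2" (say, "John has three children" and "John has children") is a hypothetical example that does not appear in the text. Note that clause (i) of the presupposition definition coincides with the entailment condition, so the distinguishing case is the one in which the first sentence is false.

```python
from typing import Dict, List, Optional

# A situation assigns each sentence True, False, or None (no truth value).
Situation = Dict[str, Optional[bool]]


def entails(a: str, b: str, situations: List[Situation]) -> bool:
    """Every situation that makes a true also makes b true."""
    return all(s[b] is True for s in situations if s[a] is True)


def presupposes(a: str, b: str, situations: List[Situation]) -> bool:
    """b is true whenever a is true and whenever a is false."""
    return all(s[b] is True for s in situations if s[a] is not None)


# Hypothetical situations. "4a" = All of John's children are asleep,
# "4b" = John has children. When 4b is false, 4a has no truth value.
# "e1"/"e2" are an assumed plain-entailment pair for contrast.
SITUATIONS: List[Situation] = [
    {"4a": True, "4b": True, "e1": True, "e2": True},
    {"4a": False, "4b": True, "e1": False, "e2": True},
    {"4a": None, "4b": False, "e1": False, "e2": False},
]

print(presupposes("4a", "4b", SITUATIONS))
print(entails("e1", "e2", SITUATIONS))
print(presupposes("e1", "e2", SITUATIONS))
```

The third check fails because "e1" can be false in a situation where "e2" is also false, which is exactly what the negation clause of the presupposition definition rules out.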


2 Strawson's view of presupposition, of course, states that this relation holds of “statements”, whereas
some  take  this  to  be  a  relation  between  sentences. 


With  the  sentences  in  (5)  we  see  what  is  called  “pragmatic  presupposition”.  By  the 
use  of  the  aspectual  modifying  verb  'stop'  we  are  eliciting  the  presupposition  in  (5b).  As 
with  the  pair  in  (4),  if  (5b)  is  false,  then  there  is  something  strange  about  (5a)  (and  in 
Strawson's  theory,  this  translates  to  the  lack  of  a  truth-value  for  this  statement). 

There are two other types of pragmatic presuppositions that should be mentioned here.
One  refers  to  certain  conditions  that  must  be  met  for  a  speech  act  to  be  “felicitous”  and 
appropriate  in  a  specified  situation.  For  example,  (6b)  is  a  reasonable  assumption  or 
presupposition  for  (6a)  (The  example  is  taken  from  Fillmore  (1971)). 

(6)  a.  John  accused  Harry  of  writing  the  letter. 

b.  There  was  something  blameworthy  about  writing  the  letter. 

Finally,  there  is  the  influence  of  the  background  knowledge  (shared  information)  when 
making  an  utterance  in  a  context,  that  can  be  thought  of  as  presuppositional.  Consider  the 
sentences  in  (7). 

(7)  a.  It  wasn't  Mary  that  John  married. 

b.  John  didn't  marry  MARY  (focused). 

c.  John  married  someone,  (and  in  fact  someone  else). 

The  assumed  knowledge  between  the  speaker  and  the  hearer  in  this  case,  (7a)  or  (7b),  is 
the  proposition  in  (7c).  The  presupposition  is  accomplished  by  different  means  in  each 
sentence,  however.  (7a)  seems  to  have  (7c)  as  a  presupposition  because  it  is  in  a  cleft 
construction. (7b) has the presupposition in (7c) because it carries focus on the object
position.3

Having reviewed the types of presuppositions, we should note what the role of
conventional implicatures is in discourse analysis. These are non-truth-conditional inferences
that are associated with (or “attached” to) certain lexical items. For example, words such as
'but' carry the conventional implicature that there is a contrast of some sort between the
conjoined elements. When we examine the work of Reichman and other structural analysts in
the next section, the interpretation of clue words will determine just such inferences for the
discourse.

Central to Grice's theory of language use is the notion of the conversational
implicature. The major “principle” governing a person's behavior in a discourse is
formulated as follows (Grice (1975)).

(8) Cooperative Principle: Make your conversational contribution such as is required, at the
stage at which it occurs, by the accepted purpose or direction of the talk-exchange in
which you are engaged.

This subsumes the maxims of quantity, quality, relation, and manner. We will not
discuss these in any detail in this paper (but see Bach and Harnish (1982) for a clear
exposition of their role in discourse).

Finally, consider the import of felicity conditions in the understanding of an utterance in
discourse. These are the conditions that are required for “nondefective” communication.
Felicity conditions are to be distinguished from the “success conditions”, which are those
conditions that are necessary and sufficient for the performance of an act. In our
discussion, we will assume that such conditions are necessary, but will have little to say
about that here.4

3 The effect of focus on presuppositions has long been known. Behaghel (1934) mentions this in the
context of its relation to theme-rheme structure of a sentence and the discourse implicatures
accompanying it. This was also noted by the Prague linguists, cf. Hajicova (1981). Recently Rooth
(1985) has examined the issues surrounding focus and presupposition as well.

Along with these semantic aspects of an utterance, we must include the deeper
coherence  relations  in  a  discourse,  such  as  causal,  temporal,  spatial,  and  definitional 
considerations.  We  will  have  more  to  say  about  these  later. 

It  is  difficult  to  address  one  of  the  areas  above  without  getting  involved  in  at  least 
one  other.  Therefore  no  clearly  delineated  classification  is  possible  for  "who  works  on  which 
topic"  and  just  what  is  meant  by  "semantics.”  Nevertheless,  I  would  like  to  compare  the 
work  done  on  these  topics  by  establishing  what  feeding  relationship  exists  between  them. 

Let us begin by identifying what appear to be the three major approaches to
discourse analysis:

(9)  a.  Structural  Analysis 

b.  Goal  Recognition 

c.  Model  Theory 

We  turn  immediately  to  the  first  approach  in  the  next  section. 




1.1  Structural  Analysis 




4  But  see  Austin  (1962)  and  Searle  (1969)  for  the  best  discussion  of  this  issue. 
The major concerns of those working in this paradigm are to identify structural
elements such as topic, focus, discourse moves, and context spaces. This approach is primarily
concerned with how the structure of a discourse influences the interpretation as well as the
linguistic realizations of a text. Chief proponents of this view are Grosz, Webber,
Reichman, and Sidner, as well as Mann and Thompson.

Early  work  by  Webber  (1979)  and  Grosz  (1978,  1981)  was  aimed  at  identifying  the 
contexts within which discourse anaphora was licensed. The notion of focus and topic was adopted
to  delimit  the  space  within  which  anaphoric  binding  is  possible.  That  is,  only  if  something 
is  labeled  with  such  a  discourse  marker  can  certain  pronominal  references  be  licensed. 

As Reichman (1984) puts it, the purpose of discourse analysis is to identify “a
conversation's deep structure in terms of the structural relations between the discourse
elements.”5 In this view discourse structure is defined by the conversational moves (CM)
taken  by  the  participants  in  the  discourse.  Each  move  takes  the  discourse  into  a  new 
stage;  that  is,  each  move  has  associated  effects.  Also  central  to  this  model  for  discourse 
analysis  is  the  notion  of  context  space,  which  is  an  “abstract  structure”  taking  into  account 
the  following  components: 

(10)

1. The propositional representation of the discourse utterance.

2. The conversational move (CM).

3. The preconditions for the move.

4. Links to previous discourse spaces.

5. Focus level assignments for various elements in the context space.
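The five components in (10) map naturally onto a record type. The Python sketch below is one possible encoding, not Reichman's own representation: the field names, the use of strings for propositions and preconditions, the "high"/"low" focus labels, and the "claim" move in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextSpace:
    proposition: str                      # (10.1) propositional representation
    move: str                             # (10.2) conversational move
    preconditions: List[str] = field(default_factory=list)       # (10.3)
    links: List["ContextSpace"] = field(default_factory=list)    # (10.4)
    focus: Dict[str, str] = field(default_factory=dict)          # (10.5)

    def high_focus_items(self) -> List[str]:
        """Elements currently assigned the (assumed) label 'high'."""
        return [item for item, level in self.focus.items() if level == "high"]


# Hypothetical two-space discourse: a claim followed by a supporting utterance.
root = ContextSpace(
    proposition="John is hard to like",
    move="claim",
    focus={"John": "high"},
)
support = ContextSpace(
    proposition="John is rich",
    move="support",
    preconditions=["a claim is open"],
    links=[root],
    focus={"John": "high", "money": "low"},
)

print(support.high_focus_items())
```

Keeping the links as direct references to earlier spaces makes it easy to walk back through the discourse when a later move returns to an interrupted space.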

5  Cf.  Reichman  (1984)  for  a  full  discussion. 


According to Reichman's view, all discourse utterances obey certain rules, regardless
of the type of discourse. A few of the more important ones are given below.

(11)

1.  Conversation  is  a  series  of  moves  linked  by  functional  relations. 

2.  Utterances  in  a  single  context  space  serve  the  same  move. 

3.  A  move  has  preconditions  and  effects  associated  with  the  underlying  discourse 
structure. 

4.  While  in  a  subspace,  the  containing  context  space  retains  control. 

5. Inter-sentential anaphoric binding is possible only with high focus items.

Central  to  this  model  of  discourse  analysis  is  the  belief  that  conversational  moves 
(moves)  are  recoverable  from  the  specific  linguistic  structure  of  the  text.  Thus,  we  have  a 
taxonomy of possible moves and the clue words most frequently associated with them:


(12)

   MOVE                                CLUE WORD

1. Support                             Because; Like
2. Restatement and/or conclusion
   of point supported                  So
3. Interruption                        By the way
4. Return to interrupted space         Anyway
5. Indirect challenge                  Yes, but
6. Direct challenge                    (No) but
7. Subargument concession              All right
8. Prior logical abstraction           But look
9. Further development                 Now
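To make the role of the table in (12) concrete, the following Python sketch looks up a candidate move from the clue word that opens an utterance. The table entries come from (12); the matching strategy (case-insensitive prefix match, longest clue first, with "(No) but" expanded to "no, but" and "but") is an assumption of this sketch and is far cruder than a real analysis of clue words would be.

```python
from typing import Optional

# Clue words from (12), ordered so that longer clues are tried first.
CLUE_WORDS = [
    ("by the way", "interruption"),
    ("but look", "prior logical abstraction"),
    ("yes, but", "indirect challenge"),
    ("no, but", "direct challenge"),
    ("all right", "subargument concession"),
    ("because", "support"),
    ("anyway", "return to interrupted space"),
    ("like", "support"),
    ("but", "direct challenge"),
    ("now", "further development"),
    ("so", "restatement and/or conclusion of point supported"),
]


def guess_move(utterance: str) -> Optional[str]:
    """Return the move suggested by an utterance-initial clue word, if any."""
    text = utterance.strip().lower()
    for clue, move in CLUE_WORDS:
        # Require a word boundary so that "so" does not match "someone".
        if text == clue or text.startswith(clue + " ") or text.startswith(clue + ","):
            return move
    return None


print(guess_move("By the way, did you call her?"))
print(guess_move("Yes, but he is rich."))
print(guess_move("Someone left the door open."))
```

A lookup of this kind only proposes a move; whether the proposal is accepted would still depend on the preconditions of the move and the state of the current context space.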


The  “deep  structure”  of  a  discourse  consists  of  a  sequence  of  the  above  moves, 
through  which  a  conventional  interpretation  (the  understanding  of  the  discourse)  is 
accomplished.  This  essentially  involves  recovering  the  mutual  knowledge  between  the 
participants  in  the  discourse. 


Also  very  conscious  of  the  role  that  discourse  segments  and  clue  words  play  in  the 
proper  analysis  of  discourse  is  the  recent  work  by  Grosz  and  Sidner  (1986).  They  propose  a 
model  of  discourse  structure  with  three  interacting  components: 

1.  A  linguistic  structure:  the  utterance  itself. 

2.  An  intentional  structure;  and 

3.  An  attentional  state:  an  abstraction  of  the  focus  of  attention  of  the  discourse 
participants. 

Central  to  their  model  is  the  notion  of  a  Discourse  Purpose  (DP),  which  is  the  “intention 
that  underlies  engaging  in  a  discourse.”  There  is  one  discourse  segment  purpose  for  each 
discourse  segment.  Furthermore,  the  process  of  manipulating  focus  spaces,  referred  to  as 
focusing,  combines  with  the  DP  to  control  the  emerging  discourse.* 


* Grosz and Sidner's (1986) paper became available to me much too late to critique and review
thoroughly, so I undoubtedly do it an injustice here. Cf. Pustejovsky (in preparation) for a closer
analysis of this work and its relevance to the model presented here.
1.2 Goal Recognition

A  very  different  approach  to  discourse  analysis  is  that  which  I  will  call  Goal 
Recognition.  This  differs  significantly  from  the  structural  analysis  school  in  one  important 
respect:  what  is  being  recovered  from  an  utterance  and  what  is  being  represented  as  the 
understanding  of  the  discourse  (or  text)  is  something  much  deeper  than  the  structural  form 
of  the  text.  Within  this  approach  we  can  single  out  two  major  schools  of  thought:  those 
concerned  with  narrative  form,  coherence,  and  story  understanding  (Schank,  Abelson,  Hobbs, 
and  Wilensky);  and  those  concerned  with  the  recognition  of  speech  acts  and  intentions 
(Cohen,  Allen,  and  Perrault). 

For Schank and Abelson (1977), and much of the Yale school, understanding a text
is a problem of inference generation and control. That is, a reader attempts to find the
implicit connections between the sentences in the text. As a solution to the infinite search
space problem of inferences, they proposed that there are script-like knowledge structures
which we can access in order to understand stories. Thus we recover these prototypical
event-sequences, the scripts, and form a coherent understanding of the text.

Wilensky  (1982)  points  out  a  number  of  problems  with  this  approach,  chief  among 
them  the  fact  that  not  all  stories  or  texts  can  be  characterized  as  stereotypical  sequences  of 
events.  He  proposes  a  theory  of  text  coherence  that  incorporates  the  goals  and  plans  that 
actors  in  a  text  may  have.  Thus,  we  try  to  recognize  what  the  intention  of  the  actor  is 
and  piece  together  the  text  on  the  basis  of  this  goal. 

Whereas  Wilensky  is  concerned  more  with  the  underlying  intentions  and  goals  of  the 
agents  in  a  text,  Hobbs  (1978,  1982)  attempts  a  general  classification  of  coherence  relations 
that  may  exist  in  a  text.  The  two  that  he  examines  in  detail  (Hobbs  1982)  are  elaboration 
and occasion. These relations are formal constraints on an inference mechanism which
constructs a tree-like structure for a discourse containing all the asserted and presupposed
propositions (cf. Hobbs (1980)).

Lehnert  (1978,  1982)  is  also  critical  of  the  purely  script-based  and  story  grammar 
approach  to  understanding  as  being  too  top-down  oriented.  She  proposes  a  system  of  text 
analysis  and  memory  organization  which  has  the  features  of  bottom-up  processing  as  well. 

In  this  theory  the  underlying  notion  of  coherence  is  based  on  affect  states  and  plot 
units.  Affect  states  are  a  set  of  primitive  predicates  over  states  and  events,  with  values 
positive,  negative,  or  neutral.  That  is,  an  event  is  positive,  etc.  with  respect  to  an  object. 
These  states  are  bound  to  objects. 

In  addition  to  these  primitive  predicates  are  links  between  event/state  pairs  that 
describe  causal  coherence  relations.  These  are:  motivation,  activation,  termination,  and 
equivalence.  From  these  notions  Lehnert  then  defines  the  notion  of  plot  unit:  a  plot  unit  is 
a  directed  labeled  link  from  one  affect  state  value  to  another.  The  underlying  coherence  of 
a  narrative,  then,  is  captured  in  terms  of  these  units. 
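The definitions just given can be written down as simple data types. The Python sketch below follows the description in the text (three affect values, four link types, and a plot unit as a labeled link between two affect states); the class names, the one-letter link codes, the compact signature format, and the example story are assumptions introduced here and are not taken from Lehnert's papers.

```python
from dataclasses import dataclass
from enum import Enum


class Affect(Enum):
    POSITIVE = "+"
    NEGATIVE = "-"
    NEUTRAL = "0"


class Link(Enum):
    MOTIVATION = "m"
    ACTIVATION = "a"
    TERMINATION = "t"
    EQUIVALENCE = "e"


@dataclass(frozen=True)
class AffectState:
    description: str   # the state or event being evaluated
    character: str     # the object the affect value is bound to
    value: Affect


@dataclass(frozen=True)
class PlotUnit:
    source: AffectState
    link: Link
    target: AffectState

    def signature(self) -> str:
        """Compact form: source value, link code, target value."""
        return f"{self.source.value.value} -{self.link.value}-> {self.target.value.value}"


# Hypothetical example: a negative event motivates a (neutral) goal state.
loss = AffectState("John loses his job", "John", Affect.NEGATIVE)
goal = AffectState("John wants a new job", "John", Affect.NEUTRAL)
unit = PlotUnit(loss, Link.MOTIVATION, goal)

print(unit.signature())
```

Because the signature abstracts away from the particular events, two narratives with different content can be compared by the sequence of signatures they produce.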

It  is  important  to  note  that  for  these  approaches,  the  inference  processes  are  spawned 
as  a  result  of  the  knowledge  structures  associated  with  propositions  (and  the  plans  they  fall 
into) rather than linguistic or surface structural clues.7

Alterman (1985) proposes an interesting theory of text coherence based on the notion
of event concept coherence. This property is part of the dictionary entry for an event/state
description, and provides a way to group text into structured bundles, based on their
relative coherence. Alterman makes three claims for this theory: (1) text is composed of
structured chunks of conceptual event/state descriptions; (2) events can be bundled together
without stating their complete causal connections; (3) the initial grouping and structuring of
text can be done with simple augmentation of case relationships by inter-event relations.

7 This is not completely true, of course. Some researchers in this school make use of clue words just
as Reichman and Polanyi and Scha do.

The  concept  coherence  relations  assumed  by  Alterman  are  characterized  as  follows: 

(13)  a.  Taxonomic-class/subclass 

b. Partonomic

i.  sequence/subsequence 

ii.  coordinate 

c.  Temporal 

i.  before 

ii.  after 

Thus  in  an  example  such  as  (13),  it  is  the  relative  proximity  of  the  concepts  chop  and  drop 
via  the  concept  hold  that  establishes  the  coherence  between  the  two  sentences. 

(13)  a.  The  peasant  was  chopping  a  tree  in  the  woods, 
b.  He  dropped  his  axe... 
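The idea that chop and drop cohere "via the concept hold" can be pictured as a path search over a small concept network. The Python sketch below is only a schematic reading of that idea: the network, its undirected and unlabeled links, and the use of breadth-first search to measure proximity are assumptions of this sketch, not a description of Alterman's system.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical fragment of a concept network; "sleep" is an unrelated concept.
CONCEPT_LINKS: Dict[str, List[str]] = {
    "chop": ["hold"],
    "hold": ["chop", "drop"],
    "drop": ["hold"],
    "sleep": [],
}


def coherence_path(start: str, goal: str,
                   links: Dict[str, List[str]]) -> Optional[List[str]]:
    """Shortest chain of linked concepts from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(coherence_path("chop", "drop", CONCEPT_LINKS))
print(coherence_path("chop", "sleep", CONCEPT_LINKS))
```

A short path suggests that the two event descriptions belong in the same bundle; the absence of any path suggests that they do not.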

Another approach that addresses questions of goal recognition is taken by Cohen,
Allen, and Perrault.* These researchers have as their primary concern the recognition and
modeling  of  the  speaker's  plans  in  a  dialogue.  According  to  this  view,  speakers'  intentions 
can  be  thought  of  as  plans,  and  speech  acts  are  no  different  from  any  other  actions. 
Hence,  they  can  be  planned  and  recognized  with  algorithms  and  heuristics  already  employed 
in  AI  for  planning  systems  (e.g.  STRIPS). 

* The work of Grosz (1978) deals with tracking a dialogue topic in a task-oriented domain. She
employed  plan-tracking  heuristics  to  this  end,  but  did  not  embed  speech  acts  into  a  general 
planning  environment. 


Following Cohen and Perrault (1979), this approach treats actions as operators defined
in terms of preconditions (applicability conditions), effects, and bodies, which explicate how
to achieve the effects. These are evaluated relative to the speaker's models of their
listeners. Thus discourse processing in this view has nothing to do with the structure of
the discourse per se but rather with the intentions of the speakers.9
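A speech act treated as an operator with preconditions, effects, and a body can be sketched in a few lines of Python. The sketch is a deliberately simplified, STRIPS-like illustration and not Cohen and Perrault's formalization: states are flat sets of strings, effects only add facts (there is no delete list), and the INFORM operator with its predicate spellings is a hypothetical example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

State = FrozenSet[str]


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[str]   # applicability conditions
    effects: FrozenSet[str]         # facts added when the act succeeds
    body: Tuple[str, ...]           # lower-level actions that achieve the effects

    def applicable(self, state: State) -> bool:
        """An operator applies when all its preconditions hold in the state."""
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return state | self.effects


# Hypothetical speech act: speaker S informs hearer H that P.
INFORM = Operator(
    name="INFORM",
    preconditions=frozenset({"believes(S, P)"}),
    effects=frozenset({"believes(H, believes(S, P))"}),
    body=("utter(S, P)",),
)

before: State = frozenset({"believes(S, P)"})
after = INFORM.apply(before)
print(sorted(after))
```

Because speech acts are ordinary operators in this encoding, a planner that chains operators by matching effects to preconditions can, in principle, plan utterances alongside physical actions, which is the point made in the text.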

The  model  that  a  speaker  has  of  his  listeners  involves  representing  the  beliefs  and 
goals  of  those  people.  Belief  is  interpreted  for  Cohen  and  Perrault  as  a  modal  operator, 
A-BELIEVE,  taking  propositions  as  its  argument.  This  formal  treatment  (cf.  Hintikka 
(1969))  allows  for  infinite  embeddings  of  belief  contexts,  with  the  advantages  and  problems 
of  such  an  approach.* 

Recently  Litman  and  Allen  (1984)  have  extended  the  planning  paradigm  to  allow 
plans  about  the  planning  process  itself.  This  allows  for  tracking  clarification  subdialogues 
while  still  keeping  track  of  the  plans  associated  with  the  speech  act  being  performed. 

Finally,  another  important  approach  to  belief  (and  goal)  recognition  is  that  taken  by 
Wilks and Bien (1983). This “least-effort” approach to language understanding and belief
representation is to be contrasted with that just mentioned, such as Allen and Perrault
(1980). Wilks and Bien argue that deep nestings of beliefs could not possibly be efficient
from a psychological or computational perspective.
They  propose  as  an  alternative  a  theory  of  belief  percolation,  whereby  temporary  frames 
(pseudo-texts)  indicating  belief  states  can  be  pushed  down  into  another  such  frame,  if 

9 Recently Litman and Allen (1984) have proposed a model of plan recognition that does incorporate
some  of  the  strategies  found  in  the  structural  analysis  school.  We  will  return  to  this  theory  below. 

*  For  discussion  of  this  topic,  cf.  Cohen  and  Perrault  (1979). 
1.3 Model Theory

Recently  there  has  been  much  work  done  on  discourse  within  formal  approaches  to 
linguistics  and  semantics.  I  am  thinking  in  particular  of  the  Discourse  Representation 
Theories  of  Kamp  (1981)  and  Heim  (1982)  and  the  recent  work  on  Situation  Semantics  by 
Barwise and Perry (1983). These approaches take (at least in spirit) as their point of
departure the formal framework proposed by Montague (1974) and Kaplan's work on
indexicals  and  demonstratives  (Kaplan  1977).  There  isn't  room  here  to  examine  these  works 
in  detail,  but  I  will  review  the  major  points  of  their  theories. 

Kamp's  (1981)  main  concern  is  the  correct  interpretation  and  representation  of 
discourse  referents.  Essentially,  Kamp  argues  that  deictic  and  anaphoric  occurrences  of 
pronouns  are  identical,  and  that  identifying  their  antecedents  involves  selection  from 
specified  sets  of  previously  available  entities.  Associated  with  an  utterance  is  a  discourse 
representation  structure  (DRS)  containing  the  appropriate  quantification  over  the  entities  in 
the  proposition,  as  well  as  the  propositional  content.  To  illustrate,  consider  the  DRS  for 
(14a),  shown  in  (15): 


(14)  a.  Pedro  owns  a  donkey, 
b.  He  beats  it. 


(15) 


u  v 

Pedro  owns  a  donkey 
u  =  Pedro 
u  owns  a  donkey 
donkey(v) 
u  owns  v 


Now, the novel aspect of Kamp's proposal comes with the DRS for (14b). Because there
are no possible referents within (14b) for the two pronouns, it does not license a separate
DRS but must rather be embedded within (or bound by) another, satisfying structure; in
this case, (15). Hence we have the DRS for the discourse pair, shown in (16).

(16)

u v

Pedro owns a donkey
u = Pedro
u owns a donkey
donkey(v)
u owns v
He beats it
u beats it
u beats v

The proper linking is now possible between the pronominals and their antecedents, since
there is a common scope delimiter, viz. the DRS, which contains both binder and variable.
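The step from (15) to (16) can be mimicked with a very small Python sketch. It is not Kamp's construction algorithm: the DRS is reduced to a list of referents plus a list of final conditions, intermediate conditions such as "u owns a donkey" are omitted, and pronouns are resolved by an assumed lookup table from each referent to the pronoun that can pick it up, searching the most recent referent first.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DRS:
    referents: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)
    pronoun_of: Dict[str, str] = field(default_factory=dict)  # assumed feature table

    def introduce(self, referent: str, pronoun: str, *conditions: str) -> None:
        """Add a discourse referent together with the conditions on it."""
        self.referents.append(referent)
        self.pronoun_of[referent] = pronoun
        self.conditions.extend(conditions)

    def resolve(self, pronoun: str) -> str:
        """Find the most recent referent in this DRS that the pronoun can pick up."""
        for referent in reversed(self.referents):
            if self.pronoun_of[referent] == pronoun:
                return referent
        raise LookupError(f"no accessible referent for '{pronoun}'")

    def add_pronominal_sentence(self, subject: str, verb: str, obj: str) -> None:
        self.conditions.append(f"{self.resolve(subject)} {verb} {self.resolve(obj)}")


drs = DRS()
drs.introduce("u", "he", "u = Pedro")                # from (14a)
drs.introduce("v", "it", "donkey(v)", "u owns v")    # from (14a)
drs.add_pronominal_sentence("he", "beats", "it")     # (14b), embedded in the same DRS

print(drs.conditions)
```

Calling resolve on an empty DRS raises LookupError, which corresponds to the observation that (14b) cannot license a DRS of its own.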

Heim's (1982) approach is similar to Kamp's in many respects, but her concern is
how to represent the presuppositions carried by utterances. Crucial to this theory is the
notion of a file, a record on which descriptions of entities can be kept, and which is
evaluated with respect to rules of familiarity and file-change.

According to Heim, every sentence has file change potential. That is, every utterance
has the potential to change the context set of the utterances following it. The common
ground, in Stalnaker's (1979) terms, between the speaker and the hearer, is the set of
presuppositions common to both. This is what is contained in the file of a context in
Heim's theory. Another view that should be mentioned here is Lauri Carlson's game
theory of discourse (Carlson (1983)). Space does not permit us to examine it here. However,
we do discuss some of his ideas in Section 3.0.


2. Shortcomings of the Current Approaches to Discourse Analysis


It  is  clear  from  our  discussion  above  that  what  counts  as  a  representation  of  the 
discourse  or  as  the  “understanding”  of  the  text  differs  wildly.  In  this  section  I  would  like 
to  explore  how  these  different  representations  interact  and  propose  a  model  for  discourse 
analysis  incorporating  these  component  parts. 

2.1  Conversational  Moves  versus  Coherence 

Let  us  begin  by  examining  the  logical  distinction  between  possible  conversational 
moves  in  a  discourse  and  possible  types  of  coherence  that  tie  a  text  together.  Reichman 
and others, following Grice (1971), classify utterances according to the roles they play in the
discourse, e.g. supporting, elaborating, interrupting, etc. Others working in goal recognition
have  classified  the  types  of  coherence  relations  that  exist  between  sentences  in  a  text  or 
discourse.  These  include  causation,  temporal  ordering,  but  also  notions  such  as  elaboration 
and  occasion.  The  problem  here  is  that  what  some  are  calling  moves  in  a  discourse  others 
term  coherence  relations. 

Hobbs  (1982),  for  example,  describes  the  two  coherence  relations,  elaboration  and 
occasion.  In  the  dialogue  shown  in  (17),  (b)  is  said  to  elaborate  (a). 

(17)  a.  John  can  open  Bill's  safe, 
b.  He  knows  the  combination. 

Similarly  (18)  is  said  to  be  an  instance  of  an  elaboration. 

(18)  a.  Go  down  Washington  St. 

b. Just follow Washington St three blocks to Adams St.

Although the (b) examples above clearly elaborate the (a) sentences, there is much more
that can be said about the coherence relations between them than this. The notion of
elaboration Hobbs is using here is structural coherence and is not significantly different
from  a  conversational  move  for  the  structural  analysts.  In  this  sense  I  agree  that  both  (17) 
and  (18)  are  structural  elaborations. 

A  deeper  description,  however,  of  the  connectedness  between  the  two  sentences  in 
(17)  would  involve  something  like  a  because-of  relation;  that  is,  the  real  coherence  link  here 
is  enablement  and  not  elaboration.  The  connectedness  between  (18a)  and  (18b),  on  the  other 
hand,  is  one  of  identity.  Although  structurally  an  elaboration,  (18b)  reflects  a  changed 
performative  strategy  by  the  speaker,  due  to  his/her  model  of  the  hearer's  beliefs. 

The other coherence relation Hobbs mentions is occasion, which can be defined simply
as  follows:  A  occasions  B  if  A  creates  a  state  so  that  B  can  occur.  An  example  of  this  is 
a  text  involving  direction  giving: 

(19)  a.  Turn  left. 

b. Go to the corner.

By  performing  the  action  denoted  in  (19a)  a  change  of  location  is  effected  that  allows  the 
action  in  (19b)  to  occur.  The  structural  relationship  between  (a)  and  (b)  is  simply  a 
continuation  or  further  development,  and  I  agree  with  Hobbs  that  the  coherence  link  here 
is  one  of  occasioning. 

While  Hobbs  and  others  fail  to  make  a  careful  distinction  between  conversational 
moves  and  deeper  coherence  relations,  still  others  ignore  the  role  of  discourse  moves 
entirely.  Alterman  (1985),  for  example,  develops  a  taxonomy  of  concept  coherence  terms 
with  which  his  system  creates  a  complete  representation  of  a  narrative  text  without  recourse 
to textual clues or moves. The obvious problem with this approach, in my opinion, is that
without the structural clues provided by a discourse or text (such as topic and focus) it is
impossible to adequately recover the interpretation of pronouns and deictic terms. For
example, in the partial text mentioned in section 1 (cf. (13)), he is bound by the NP
mentioned  in  the  previous  sentence,  the  peasant.  But  it  is  not  the  underlying  coherence 
relation  that  licenses  this  as  much  as  the  structural  positioning  of  the  antecedent  relative  to 
the  pronoun. 

Determining such structural environments for discourse anaphora has been the
concern of researchers such as Sidner, Grosz, Webber, and Reichman. One such licensing
context  is  the  domain  of  focus,  which  accounts  for  the  anaphoric  behavior  of  the  pronoun 
discussed  in  the  previous  paragraph.  These  theories  suffer,  however,  from  the  lack  of  any 
coherent  representation  of  the  deeper  semantic  relations  between  the  discourse  entities. 

As  discussed  above,  Reichman  proposes  a  theory  of  discourse  structure  based  on 
conversational  moves.  Clue  words  act  to  signal  when  a  shift  in  context  is  being  made. 
This  model  takes  a  surface  representation  (call  it  SS)  and  maps  it  into  a  discourse 
representation  (DR)  using  these  clue  words  as  triggers  for  interpretation.  Thus,  an  utterance 
such as (20b) is construed as a support for (20a).

(20)  a.  I  don't  like  John,  (b)  because  he's  rich. 

Let  the  interpretation  of  (20a)  be  represented  by  P,  and  (20b)  by  Q.  The  derived  DR  for 
this  pair  is  then, 

(21) P because Q → supports(Q,P)

Interestingly, however, there is another interpretation of (20) with the because connective
(operator) inside the scope of the negative in (20a). The reading here can be paraphrased
as, “It is not the case that (P because of Q), but (P because of Q').” The function of
because under this reading is not direct support, but rather to trigger an entirely different
set of presuppositions: namely, that there is some other support for P that is not explicitly
mentioned, and that Q does not support P.


This  points  to  the  problem  of  what  to  take  as  the  input  to  discourse  analysis. 
Reichman  assumes  that  surface  structure  is  the  natural  choice,  as  do  most  structural 
analysts.  This  example,  however,  seems  to  indicate  that  Logical  Form  (LF)  may  have  a 
feeding  role  into  Discourse  Representation.  Any  presuppositions  or  discourse  moves 
associated  with  the  second  interpretation  would  have  to  be  derived  from  the  LF,  where  the 
appropriate  scope  assignments  are  represented  (cf.  (22)). 

(22) [P because Q] → supports(Q',P)

Although  this  is  an  isolated  example,  I  think  it  is  important  to  study  such  interactions  in 
order  to  establish  the  feeding  relations  between  the  various  interpretation  levels. 
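
The contrast between (21) and (22) can be made concrete with a small sketch. It is illustrative only: the tuple encoding of LF, the function name discourse_relations, and the placeholder "Q'" for the unmentioned support are assumptions of the sketch, not part of Reichman's model or of the model proposed here.

```python
from typing import List, Tuple

# Hypothetical scoped logical forms for (20). The surface string is the same
# in both cases; only the relative scope of "not" and "because" differs.
LF_WIDE = ("because", ("not", "like(I, John)"), "rich(John)")
LF_NARROW = ("not", ("because", "like(I, John)", "rich(John)"))


def discourse_relations(lf) -> List[Tuple[str, object, object]]:
    """Read support relations off a scoped logical form (illustrative only)."""
    if lf[0] == "because":
        _, p, q = lf
        # Reading (21): Q directly supports P.
        return [("supports", q, p)]
    if lf[0] == "not" and isinstance(lf[1], tuple) and lf[1][0] == "because":
        _, (_, p, q) = lf
        # Reading (22): Q is denied as support and some other Q' is presupposed.
        return [("not-supports", q, p), ("supports", "Q'", p)]
    return []
```

A mapping that starts from the surface string cannot distinguish the two cases, since both encodings above correspond to the same words.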

Another criticism that can be leveled at Reichman concerns her misunderstanding of
the Toronto school's (Allen, Cohen, Perrault) meaning of “understanding.” She points out
that one must distinguish between a person's intention for an utterance and the
communicative effect of the utterance in context: “[While] a speaker's intent may well be
reflected by a communication, grasping that intent cannot be a necessary precondition for
understanding” (Reichman (1964)). The confusion here is this: Reichman states that a
hearer's interpretation is dependent on the communicative effect of the utterance in context,
and this may or may not be identical to the speaker's intent. I agree with this, but I
would not call this understanding the speaker. This is in fact the basis for misunderstanding
in a communicative act. In order to fully understand the speaker, recovering the intent is
not a sufficient condition, but it is at least a necessary one.


Finally  the  question  arises  as  to  where  the  model  theoreticians  fit  into  the  discussion 
above.  First,  it  is  obvious  that  the  major  concerns  are  different  for  these  researchers. 
Although questions of anaphora and reference are dealt with, Kamp's theory doesn't
address the problems of inferencing or goal recognition and planning. Nor does he look at
the  structure  or  semantics  of  meta-sentential  text  and  ask  questions  pertaining  to  coherence. 
Yet  these  are  not  his  immediate  interest.  Heim  addresses  many  topics  related  to  Discourse 
Analysis  as  well,  the  emphasis  being  on  the  presuppositions  from  utterances,  and  the 
proper  characterization  of  the  common  ground,  the  mutual  belief  space.  Although  this  work 
highlights  the  importance  of  LF  for  later  interpretation  strategies,  her  concerns  do  not 
extend  to  the  deep  coherence  relations  addressed  by  Hobbs  and  others. 


3.0  Levels  of  Discourse  Analysis 


In  this  section  I  would  like  to  outline  a  model  for  discourse  analysis  based  on  fairly 
strict  levels  of  interpretation  and  establish  what  the  relationships  are  among  the  different 
components.  In  our  discussion  we  will  address  the  following  questions: 

1.  What  are  the  levels  of  analysis  for  Discourse  Analysis? 

2.  What  is  the  unit  of  analysis  for  Discourse  Analysis? 

3.  How  does  Discourse  Representation  (DR)  affect  interpretation? 

4.  If  DR  is  not  the  final  semantic  interpretation,  then  what  is? 

Although  this  model  is  obviously  incomplete  in  the  form  outlined  below,  we  claim  to  offer 
a  new  perspective  which  can  contribute  to  the  solution  of  some  long-standing  problems.  We 
should  also  note  that  this  is  a  proposal  for  a  process-oriented  model  rather  than  a 
competence  model  (but  I  will  not  discuss  this  distinction  here). 

Let  us  begin  by  separating  the  structural  or  syntactic  aspects  of  a  discourse  from  the 
coherence  relations,  which  we  will  call  the  “deeper  semantics”.  First,  it  should  be  clear  that 
the  conversational  moves  discussed  above  in  Section  1.1  are  structural  descriptions  for  the 
constituency  of  the  discourse  itself. 

We  will  view  a  conversational  move,  following  Carlson  (1983),  as  involving  the 
following  parameters. 

1.  The  author  of  the  move. 

2. The addressee(s) of the move.

3.  The  audience  of  the  move. 


4.  The  sentence  of  the  move. 


5.  The  game  rule(s)  which  justify  the  move. 

6.  The  premises  of  the  move. 

7.  The  dialogue(s)  the  move  is  in. 

Perhaps most relevant to our discussion is the structural admissibility that point (5)
addresses. That is, a move is justified in the context of a larger structural unit, referred to
as  a  game  in  Carlson's  framework.  We  will  return  to  this  point  later  in  our  discussion  of 
discourse  syntax. 
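
As a rough illustration, the seven parameters can be collected into a record, with point (5) read as an admissibility check against a game. The class names, field names, and the example rule labels ("ask", "answer", "assert") are assumptions of this sketch and are not taken from Carlson's formulation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Game:
    """A larger structural unit whose rules license moves (hypothetical encoding)."""
    name: str
    rules: List[str]


@dataclass
class ConversationalMove:
    """One move, carrying the seven parameters listed above."""
    author: str
    addressees: List[str]
    audience: List[str]
    sentence: str
    game_rules: List[str]                      # point (5): rules justifying the move
    premises: List[str] = field(default_factory=list)
    dialogues: List[str] = field(default_factory=list)

    def is_admissible(self, game: Game) -> bool:
        """A move is admissible in a game if at least one of its rules belongs to it."""
        return any(rule in game.rules for rule in self.game_rules)


question_answer = Game("question-answer", ["ask", "answer"])
monologue = Game("monologue", ["assert"])
reply = ConversationalMove(
    author="P", addressees=["S"], audience=["S"],
    sentence="Yes, he did.", game_rules=["answer"],
)
```

Under this reading, the same sentence can be admissible in one game and inadmissible in another, which is the sense in which a move is justified only relative to a larger unit.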

Also of a structural nature are the “domain” notions, such as focus and topic, which have
meaning (for interpretation purposes) only within a context, in a discourse. Constraints on
the interpretation of anaphora and deixis are definable in terms of these notions.

Similarly, textual “directives”, such as “elaboration”, are syntactic rather than
semantic in their function, since an elaboration of an expression may denote any number of
semantic connections, from causation, through non-causal explanation, to simple description.
Thus, a textual directive (or cohesion relation) establishes a certain “inferential” connectedness
without fully specifying what it is (cf. the comparison with coherence relations below).

We  thus  arrive  at  the  following  possible  structural  relations  in  a  discourse: 

1.  A  conversational  move  (CM);  e.g.  support,  interrupt,  challenge,  etc. 

2.  A  term  that  acts  to  delimit  the  evaluation  of  a  discourse  object, 
e.g.  topic,  focus,  theme,  rheme. 

3. A cohesion relation, e.g. elaboration.


While  it  is  impossible  to  characterize  all  discourses  in  terms  of  a  set  of  common 
structural  properties,  there  may  be  semantic  similarities  that  all  dialogue  situations  have  (Cf. 
Carlson  (1983)  for  such  a  view).  Yet  there  are  some  text  situations  that  lend  themselves  to 
a  fairly  straightforward  analysis.  These  are  the  simple  monologue  structures,  discourses 
involving  one  participant. 

We  can  characterize  the  complexity  of  a  discourse  by  the  possible  turns  available  at 
any  stage  in  the  dialogue.  Single  participant  speaking  situations,  then,  will  have  fewer  turns 
available  at  any  state  than  those  with  two,  and  so  on.  The  simplest  structure  in  this  view 
will  then  be  a  directed  monologue,  where  the  goal  of  the  speaker  is  brought  about  by  the 
manner  in  which  the  discourse  is  structured.11 

To say that directed monologues are the simplest discourse type is not to say that they
lack  complexity.  Within  this  family  of  discourses  we  can  distinguish  several  basic  types, 
some  still  simpler  than  others: 

Directed  Monologue  Types 

1.  Enumeration. 

2.  Elaboration. 

3.  Definition. 

4.  Description. 

5.  Proof-form. 

6.  Narrative. 

11  Other  types  of  text  and  discourse  will  also  meet  this  criterion,  of  course.  For  example,  rhetorical 
argumentation,  dialectic  discussion,  and  other  dialogues,  achieve  the  goal  of  the  participant  by  the 
structure of the discourse itself. We do not have the space to discuss these here, however.


As an example of an enumeration, consider both texts below.

The  reasons  we  should  hire  John  are  as  follows:  A,  B,  C,  ... 

There  are  several  reasons  for  hiring  John.  First  A,  secondly  B,  ... 


An elaboration monologue is a textual directive of “elaborate” for a larger text. For
example: 

(a) John can open Bill's safe.

(b)  He  has  the  combination, 

(c)  which  he  got  from  Mary. 

There  are  actually  two  types  of  elaboration  in  this  example.  The  relationship  between  (a) 
and  (b)  is  an  explanatory  elaboration  while  that  between  (b)  and  (c)  is  a  descriptive 
elaboration.12 

Given the relative simplicity of directed monologues, we will suggest that there are
useful structural generalizations that can be made about their form; namely, that a directed
monologue is defined structurally as a text in which one proposition acts as head, H, and at
least one other acts as its complement, COMP.13 Any other material in such a monologue
can be analyzed as adjunct text. We say that the text is a projection of its head. Thus, a
directed monologue, M, has the following minimal structure.

12 In a sense, however, they are both explanatory, since the latter explains how John has the
combination. The nature of explanation, however, is a rich and complex topic in its own right,
and well beyond the scope of our rather suggestive discussion here.

13 We borrow these structural notions from linguistic theory. Cf. Chomsky (1981) for further
explanation.


M → {... H, COMP, ...}


The  specific  type  of  text  will  specify  further  syntactic  properties,  for  example,  the  position 
of the head, and the number of complements, etc. To make this clearer, consider the text
structure  of  an  enumeration. 

M_enum → Head COMP
COMP → p_1 p_2 ... p_n

The only structural commitment being made here is that the listing acts as a unit,
independent  of  the  head,  or  theme.  This  approach  differs,  then,  from  “systemic” 
classifications  of  text  structure,  in  that  we  attribute  to  the  text  only  a  minimal  structure, 
absent  of  any  powerful  functional  labels.14 
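
The minimal structure can be sketched as a record with a head, a non-empty list of complements, and optional adjunct text. The class and function names below are hypothetical and chosen for this illustration; the only commitments carried over from the text are that there is one head, that there is at least one complement, and that an enumeration treats its listing as a single unit.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class DirectedMonologue:
    """Minimal structure: a head, at least one complement, optional adjuncts."""
    head: str
    complements: List[str]
    adjuncts: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.complements:
            raise ValueError("a directed monologue needs at least one complement")


def enumeration(head: str, items: Iterable[str]) -> DirectedMonologue:
    """Build an enumeration: the head (theme) plus the listing as the COMP unit."""
    return DirectedMonologue(head=head, complements=list(items))


hire_john = enumeration("The reasons we should hire John are as follows", ["A", "B", "C"])
```

No functional labels are attached to the items of the listing, in keeping with the minimal structure assumed above.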

Moving on to discourses involving two participants we quickly see the limitations of
our syntactic model. Such texts are simply too rich to lend themselves to such linear and
single  goal  grammars.  We  will  discuss  the  nature  of  more  complex  discourse  in  a  later 
work  (Pustejovsky  (in  preparation)).  In  the  next  section,  however,  we  will  idealize  the  data 
in  a  two-participant  text,  and  attempt  a  generalization  along  lines  similar  to  those  outlined 
above. 

We now consider the more interpretive aspects of discourse structure. The coherence
relations we discussed earlier in section 1.2 are less structural in nature, although sometimes
they are related to specific structural realizations. Relations such as enablement, causation
in general, or explanation are not uniquely or deterministically inferable from the linguistic
or discourse structure alone. The structure of a dialogue can be characterized independently
of coherence relations, but not of the cohesion relations and moves.

14 Cf. van Dijk (1980) for text classification approaches. Also, Mann (1984) and Mann and Thompson
(1983) for a systemic analysis of text structure, and a very nice survey of some of the approaches
taken to this problem.

The  major  notion  contributing  to  the  semantics  of  an  utterance  in  a  given  context  is 
the  intention  of  the  speaker  in  performing  that  particular  act.  We  will  term  this,  somewhat 
casually,  the  “speaker's  goal”.  This  might  be  compared  to  Grosz  and  Sidner's  Discourse 
Purpose, and we will discuss this similarity later in the paper.

Let us now attempt to organize these various contributing factors into a model for
discourse analysis (DA). In the previous section we suggested that perhaps Logical Form
(LF) is the appropriate input for discourse analysis rather than the surface structure (SS)
itself. We will continue with that assumption here.

We will assume that any adequate model of discourse analysis should represent the
distinctions between the structural properties and the semantic properties of the discourse.
We will claim that the former should be viewed as comprising a level of Discourse
Representation (DR) distinct from the purely syntactic or semantic interpretation of the
utterance. Let us then propose the following hypothesis as the first link in the model:

(23) LF → DR

That is, the Logical Form of an utterance is seen as feeding Discourse Representation
somehow.

Establishing such a model, however, is meaningless without examining what the unit
of analysis for discourse analysis is. We will assume that the utterance, as defined by
linguists, is the unit for analysis. One utterance may have several communicative effects,
however, in terms of conversational moves and the speech acts conveyed. If DR is the
level at which moves and directives are represented in our model, then the mapping from
LF to DR is not one-to-one, but rather one-to-many. For example, any non-restrictive
relative clause can be thought of as (at least) an elaboration or further development of the
NP it modifies. Yet for purposes of intra-sentential anaphora and binding, we must treat it
as one sentence.15 Similarly, adjunct clauses containing temporal adverbials and other
connectives may very often signal a conversational move on the part of the speaker, and
hence will map to a separate sub-representation.

In  order  to  capture  this  mapping  let  us  say  that  one  of  the  primitives  at  DR  is  the 
clause, i.e. a simple proposition. The syntax of DR establishes the connectedness of these
clauses  in  terms  of  the  moves  taken  by  the  speaker  (or  inherent  in  the  text).  We  express 
this  as  follows,  where  CF  is  the  abbreviation  for  Clausal  Form: 

(24) LF → {CF_1, ..., CF_n}

The CF for the sentence “John loves Mary”, for example, would be the standard logical
representation “TNS(loves(j,m))”, just as the NP “every woman” would have a representation
λP[∀x(woman(x) → P(x))]. We will not argue for a particular Logical Form, however, as this
is  not  our  major  concern  in  this  paper  (but  see  Kamp  (1981)  and  references  therein  for 
discussion  of  logical  form  for  discourse).  Regardless  of  what  logical  formalism  is  assumed 
as  input  to  DR,  it  is  important  to  stress  that  DR  contains  structural  information  that  is 
beyond  the  scope  of  any  general,  context-independent  linguistic  formalism.  The  DR  does  not 
lose  any  information  provided  by  the  structural  properties  of  LF. 

We now define the structure of DR more completely. A Discourse Representation,
DR, is the level of representation of the utterance derived from the logico-syntactic form,
LF, which represents the cohesion relationships between clauses, the domain of topic and
focus, and the moves associated with the utterance. The cohesion relations (the textual
directives) relate clausal representations, and these are then bound to a particular move. A
DR may be associated with one or more moves in the larger discourse structure, but there
must be at least one move associated with it.

15 Reinhart (1983) addresses some of the problems of anaphora between main clause constituents and
adjunct phrases.

This  gives  us  the  following  derived  structure. 

LF → <[M_i, CF_i] ... [M_j, CF_j]>

This  level  is  the  structure  on  which  we  interpret: 

1. The bindings between discourse anaphora and deictic terms and their antecedents; that
is, the domains of “topic” and “focus” mentioned above.

2. The relationship between moves in the context of higher order structures (i.e. games or
discourse trees, cf. below). In other words, how these individual moves combine to make a
story-level  or  narrative  discourse. 
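
A minimal sketch of the derived structure may help fix ideas. It assumes a DR can be stored as a list of (move, clausal form) bindings together with the cohesion relations that hold between clausal forms; the class names, the string encoding of formulas, and the move label "statement" are illustrative assumptions. The example reuses the elaboration text about John and the safe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ClausalForm:
    """A simple proposition, stored here as an uninterpreted string."""
    formula: str


@dataclass
class DiscourseRepresentation:
    # Each entry binds a clausal form to the move it belongs to.
    bindings: List[Tuple[str, ClausalForm]]
    # Cohesion relations (textual directives) between clausal forms.
    cohesion: List[Tuple[str, ClausalForm, ClausalForm]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.bindings:
            raise ValueError("a DR must be associated with at least one move")

    def moves(self) -> List[str]:
        """Distinct moves in order of first appearance."""
        seen: List[str] = []
        for move, _ in self.bindings:
            if move not in seen:
                seen.append(move)
        return seen


can_open = ClausalForm("can-open(j, safe)")
has_combination = ClausalForm("has(j, combination)")
dr = DiscourseRepresentation(
    bindings=[("statement", can_open), ("statement", has_combination)],
    cohesion=[("ELABORATE", can_open, has_combination)],
)
```

Here one logical form yields two clausal forms within a single move, which is the one-to-many character of the mapping into DR.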

From  this  structure  we  derive  a  level  that  I  will  call  Intentional  Form  (IF),  by: 

1.  Establishing  the  deep  coherence  relations  between  clausal  forms;  and 

2.  Recovering  the  speaker's  goal  associated  with  the  annotated  discourse 
representation. 

LF → <[M_i, CF_i] ... [M_j, CF_j]> → IF

Two  clausal  forms  may  be  connected  by  one  of  the  following  deep  coherence  relations: 

1.  Causal 

2.  Spatial 

3.  Temporal 


4. Definition

Causality can be thought of as a covering term to include occasioning, enablement,
and stronger senses of causation. For now, let us think of causation as an operator that
limits or prunes the possible state space following an event. Thus, where b is temporally
subsequent to a, we determine the strength of a causing b by examining b relative to the
rest of the state space generated by a.
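
One very simple way to read this pruning view is sketched below. It assumes, purely for illustration, that the state space generated by an event is a finite set of equally weighted successor states, so that the fewer alternatives an event leaves open, the stronger the causal link. The function name, the uniform weighting, and the toy state space are all assumptions of the sketch.

```python
from typing import Dict, Set


def causal_strength(state_space: Dict[str, Set[str]], a: str, b: str) -> float:
    """Strength of 'a causes b' as the share of a's successor states taken by b.

    Returns 0.0 if b is not among the states generated by a, and 1.0 if a
    prunes the space down to b alone.
    """
    successors = state_space.get(a, set())
    if b not in successors:
        return 0.0
    return 1.0 / len(successors)


# Toy state space (hypothetical): having the combination merely enables opening
# the safe, while the second event leaves only one outcome open.
toy_space = {
    "has(j, combination)": {"open(safe)", "closed(safe)"},
    "falls(oil-price)": {"declines(economy)"},
}
```

On this toy encoding, enablement comes out weaker than strict causation because it leaves more of the state space unpruned.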

If  a  textual  directive  associates  two  clausal  forms  that  are  part  of  different  moves, 
then  this  is  termed  a  move-directive.  These  are  the  clue  words  that  signal  a  change  in  the 
discourse  space. 

To illustrate how the above levels combine to form a model for Discourse Analysis,
let  us  look  at  a  sample  discourse  and  the  representations  associated  with  the  utterances. 

A.  The  economy  of  Houston,  where  most  US  oil  is  refined,  is  rapidly  declining, 

B.  Because  the  price  of  oil  is  falling. 

Assuming  an  uncontroversial  logical  representation  as  input  to  our  analysis,  the  DRs 
for  A  and  B  are  given  as  follows: 

DR_A: [type: statement &
       ∃x[economy(x) & of(x,H) & decline(x) &
       ELABORATE(H, λx(most-oil-refined(x)))]]

DR_B: [type: support &
       BECAUSE(m, ∃y(oil-price(y) &
       falling(y)))]

The non-restrictive relative in sentence A is embedded in a cohesion relation with the head
of the relative, “Houston”. Since “because” relates propositions in different moves, it acts as
a move-directive, and is analyzed similarly to clue words in Reichman's approach.
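
The test for a move-directive can be stated in a few lines. The sketch assumes each clausal form is tagged with the move it belongs to; the Directive class, the move labels "A" and "B", and the string formulas are hypothetical stand-ins for the representations of the sample discourse.

```python
from dataclasses import dataclass
from typing import Tuple

# A clausal form tagged with its move: (move id, formula).
TaggedClause = Tuple[str, str]


@dataclass(frozen=True)
class Directive:
    """A textual directive relating two clausal forms."""
    name: str
    source: TaggedClause
    target: TaggedClause

    def is_move_directive(self) -> bool:
        """True when the related clausal forms belong to different moves."""
        return self.source[0] != self.target[0]


elaborate = Directive(
    "ELABORATE",
    source=("A", "most-oil-refined(H)"),
    target=("A", "decline(economy-of(H))"),
)
because = Directive(
    "BECAUSE",
    source=("B", "falling(oil-price)"),
    target=("A", "decline(economy-of(H))"),
)
```

The elaboration stays inside move A, while “because” crosses from move B to move A and so counts as a move-directive.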

The  IF  associated  with  each  utterance  will  establish  any  coherence  relations  between 
clauses,  and  will  recover  the  speaker's  goal.  Speaking  to  the  first  point,  notice  that  the 
elaboration  in  A  will  translate  to  a  definitional  relation.  This  particular  definition  qua 
description  will  allow  the  causal  connection  expressed  by  the  move-directive,  “because”,  to 
follow with less nontextual inferencing. That is, the connection between Houston's economy
and  oil  prices  is  facilitated  by  this  definitional  coherence. 

As noted, Intentional Form will represent the goals and plans associated with the
utterance as well. Still the most elusive aspect of this level is the representation of mutual
belief, the “common ground.” Speaking in terms of what is presupposed and inferred by a
listener,  we  will  distinguish  between: 

1.  those  clausal  forms  that  are  asserted; 

2.  those  clauses  presupposed  by  the  lexical  structure  of  an  item; 

3.  those  clauses  presupposed  on  the  basis  of  structural  configuration;  and 

4.  those  clauses  presupposed  as  a  result  of  convention. 

That  is,  presuppositions  are  triggered  by  different  elements  in  different  environments 
(Karttunen,  1973,  1974). 

Now we ask, at what levels are the various presuppositions derived or computed?
Lexical presuppositions, we claim, accompany the LF structure into DR; that is, they are
already computed. Structural presuppositions, on the other hand, are computed from LF and
feed into DR. Conventional implicatures will be read off of DR itself, making use of
information associated with clue words and other “conventional implicature triggers”, while
the presuppositions associated with beliefs and common ground will be computed at IF.
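
The division of labor just described can be summarized as a lookup from kind of presupposition to the level that supplies it. The enum, the dictionary keys, and the helper name are assumptions made for this sketch; the assignments themselves follow the paragraph above.

```python
from enum import IntEnum
from typing import List


class Level(IntEnum):
    LF = 1
    DR = 2
    IF = 3


# Level from which each kind of presupposition is obtained.
SOURCE_LEVEL = {
    "lexical": Level.LF,       # already computed; accompanies LF into DR
    "structural": Level.LF,    # computed from LF, feeding DR
    "conventional": Level.DR,  # read off DR via clue words and other triggers
    "belief": Level.IF,        # beliefs and common ground
}


def presuppositions_through(level: Level) -> List[str]:
    """Kinds of presupposition whose source level is at or before `level`."""
    return sorted(kind for kind, src in SOURCE_LEVEL.items() if src <= level)
```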
4. CICERO: Inference Controlling for Discourse Analysis

In this section I would like to describe the current capabilities of a system being
designed at the University of Massachusetts. This project is part of a large natural language
understanding system, COUNSELOR, currently under development in our department. I will
first describe the scope of the research involved and how the various components interact. I
will then give a detailed description of the discourse interpreter, CICERO, as well as the
knowledge representation used by the system. At all times I hope to make it clear how this
system's functioning relates to the model proposed in the previous section. For a more
detailed view of the current implementation relating to design and control issues, see
Pustejovsky et al. (1986).

4.1  A  Natural  Language  Interface  for  a  Case-based  Legal  Reasoning  System 

COUNSELOR  is  the  combined  efforts  of  four  separate  projects  to  develop  a 
case-based  legal  reasoning  system  with  full  natural  language  capabilities.  The  projected 
capabilities  will  allow  a  lawyer  to  interactively  input  the  facts  of  a  case,  let  the  system 
analyze  them,  and  propose  the  strongest  arguments  and  counterarguments  based  on  the 
given  facts.  The  system  that  actually  does  the  legal  reasoning  (HYPO)  is  essentially  the 
intentional  agent  for  the  natural  language  front  end,  which  consists  of  a  parser,  a 
generator, and a discourse interpreter.16 The interaction of the systems is illustrated below.


16 We will not be concerned with the actual reasoning capabilities of HYPO. Cf. Ashley and
Rissland (1964) for details of the argumentation process involved in the system.


[Figure: interaction of the COUNSELOR components: Lawyer; CICERO (Inference Controller);
HYPO (Legal Reasoning); MUMBLE (generator).]


As  an  example  of  the  text  and  discourse  encountered  by  the  system,  consider  the 
fragment  below  from  an  actual  interactive  session  between  an  attorney,  P,  and  the  system. 


P:  I  represent  a  client  named  HACKINC  who  wants  to  sue  SWIPEINC  and  Leroy 
Soleil  for  misappropriating  trade  secrets  in  connection  with  software  developed  by  my 
client.  HACKINC  markets  the  software,  known  as  AUTOTELL,  a  program  to  automate 
some  of  a  bank  teller's  functions,  to  the  banking  industry. 


S:  Did  Soleil  work  for  HACKINC? 

P:  Yes,  he  did. 

S:  Did  he  then  later  work  for  SWIPEINC? 

P:  Yes. 

S:  Was  Soleil  an  employee  on  the  AUTOTELL  project? 
P:  Yes,  in  fact,  he  was  a  key  employee. 


This  example  illustrates  two  aspects  of  the  understanding  process:  (1)  fact  and  plan 
recognition  (the  opening  paragraph);  and  (2)  a  question-answer  interaction  soliciting  facts  for 
the  express  purpose  of  formulating  an  argument. 


4.2 Managing the Discourse

The  discourse  component  of  COUNSELOR  is  a  program  called  CICERO,  which  can 
be viewed as essentially two subsystems. The first tracks and predicts the structure of a
discourse based on conversational moves, interpreted through keywords and a discourse
grammar. The other subsystem manages and controls the instantiation of the deeper
semantic  relations  between  discourse  entities  and  predicates. 

The  basic  components  of  the  system  are: 

(1) A Knowledge Base defined in terms of clustered objects; and

(2)  A  best-first  control  strategy  generating  and  recognizing  the  plans  of  the  speaker  and 
hearer,  respectively. 
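
To give a feel for what a best-first strategy over clustered scripts might look like, here is a minimal agenda-based sketch. It is not CICERO's control code: the script library, the fact labels, and the scoring function (the fraction of a script's preconditions supported by known facts) are all hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical library: script name -> (preconditions, scripts it may trigger).
SCRIPTS: Dict[str, Tuple[Set[str], List[str]]] = {
    "accept-information-about-case": ({"speaker-is-attorney"}, ["misappropriate"]),
    "misappropriate": (
        {"plaintiff-produces-product"},
        ["illegitimate-access-to-knowledge", "competitive-advantage"],
    ),
    "illegitimate-access-to-knowledge": ({"employee-changed-employer"}, []),
    "competitive-advantage": ({"defendant-markets-product"}, []),
}


def score(name: str, facts: Set[str]) -> float:
    """Fraction of the script's preconditions supported by the known facts."""
    preconditions, _ = SCRIPTS[name]
    return len(preconditions & facts) / len(preconditions)


def best_first_instantiate(start: str, facts: Set[str]) -> List[str]:
    """Return scripts in the order a best-first agenda would instantiate them."""
    agenda = [(-score(start, facts), start)]  # max-score first via negation
    seen = {start}
    order: List[str] = []
    while agenda:
        negated, name = heapq.heappop(agenda)
        if negated == 0:
            continue  # no supporting facts: do not pursue this script
        order.append(name)
        for child in SCRIPTS[name][1]:
            if child not in seen:
                seen.add(child)
                heapq.heappush(agenda, (-score(child, facts), child))
    return order


known_facts = {"speaker-is-attorney", "plaintiff-produces-product", "employee-changed-employer"}
```

With these facts the agenda pursues the case script, then the misappropriation script, then illegitimate access, and leaves the unsupported competitive-advantage script uninstantiated.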

A  cluster  is  a  particular  way  to  represent  both  the  objects  in  the  world  as  well  as 
mental  objects  such  as  plans  and  goals  that  operate  over  them.  It  is  similar  to  most  Frame 
Representation  Languages  with  the  associated  inheritance  properties  (cf.  Minsky  (1975), 
Bobrow  and  Winograd  (1977)). 

The  ontology  consists  of  the  following  types:17 

1.  objects:  frames  representing  real-world  objects  with  associated  role-goal  pairs. 

2. states: predicates over the objects.

3.  events:  functions  from  one  state  to  another  state. 

4.  scripts:  prototypical  event  sequences. 
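
The four types can be mirrored in a few type definitions. This is a sketch under simplifying assumptions: a state is taken to be a set of ground facts, an event a function from state to state, and a script a list of events. The names Fact, State, Event, change_employer, and run are invented for the illustration, which uses the employment facts from the dialogue above.

```python
from typing import Callable, FrozenSet, List, Tuple

Fact = Tuple[str, ...]              # a predicate over objects, e.g. ("works-for", x, y)
State = FrozenSet[Fact]             # a state: the set of facts that hold
Event = Callable[[State], State]    # an event: a function from one state to another
Script = List[Event]                # a script: a prototypical event sequence


def change_employer(person: str, old: str, new: str) -> Event:
    """Build the event in which `person` leaves `old` and joins `new`."""
    def event(state: State) -> State:
        return (state - {("works-for", person, old)}) | {("works-for", person, new)}
    return event


def run(script: Script, state: State) -> State:
    """Apply the events of a script in order."""
    for event in script:
        state = event(state)
    return state


initial: State = frozenset({("works-for", "Soleil", "HACKINC")})
final = run([change_employer("Soleil", "HACKINC", "SWIPEINC")], initial)
```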

Using  examples  from  the  dialogue  above,  let  us  examine  what  structure  these  clustered 
objects  have,  and  what  role  they  play  in  the  interpretation  of  the  discourse. 

17 In this implementation we assume a standard temporal logic, such as Allen's (1984), for interpreting
and  reasoning  about  the  tense-based  objects  above. 


Under the current implementation,18 when the system begins to interpret the input
from the user, the discourse tracking component of CICERO has already set the
system-mode to expect a case-facts summary from either a layman or an attorney. That is,
CICERO is expecting a particular kind of speech act; namely an inform. This top-down
expectation is represented in the current discourse frame under the slot discourse-mode,
along with the contextual parameters, participants, speaker-goal, and hearer-goal.

After the parse of the initial sentence, CICERO's task is to confirm any expectations
it has concerning the speaker-goal, as well as to form a coherence representation of the
semantic content of the proposition. The parse output for this sentence is a
legal-representation frame, which tells CICERO that the speaker is an
attorney. This in turn satisfies the precondition for the discourse-script shown in (25), the
coherence representation, and confirms the system's expectation for what the speaker's goal
is; viz. to inform about a case.

The script illustrated in (25) clusters together the rhetorical moves associated with
presenting information about a case for this particular situation. Each speech act of inform
is represented as a separate action in the events field, and this defines part of the larger
textual structure of this preamble in the dialogue.


18 The clusters including scripts have been implemented as flavors in Zetalisp on a Symbolics. For
implementation  details  cf.  Pustejovsky,  Gallagher,  and  Bergler  (1985). 


(25)

(define-cluster accept-information-about-case script
  participants ((hearer)
                (speaker))
  props ((lawsuit))
  preconditions ((speaker '(type attorney)))
  events ((t0 '((optional)
                (code (establish-relationship-of-lawyer-to-party))))
          (t1 '((head)
                (code (action-taken-by-the-plaintiff))))
          (t2 '((head)
                (code (elaboration-of-case-perspective))))))
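
How such a script might be checked against incoming information can be sketched as follows. This is not the Zetalisp implementation: the Script class, its two methods, and the status labels "optional" and "head" are this sketch's reading of the event annotations in (25), and the completeness test is an assumption about how the non-optional events would be used.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Script:
    name: str
    preconditions: Dict[str, str]          # participant role -> required type
    events: List[Tuple[str, str, str]]     # (time label, status, action)

    def applicable(self, participants: Dict[str, str]) -> bool:
        """All preconditions on participant types are met."""
        return all(participants.get(role) == kind
                   for role, kind in self.preconditions.items())

    def complete(self, observed: Set[str]) -> bool:
        """Every non-optional event has been observed in the text so far."""
        return all(action in observed
                   for _, status, action in self.events
                   if status != "optional")


accept_information = Script(
    name="accept-information-about-case",
    preconditions={"speaker": "attorney"},
    events=[
        ("t0", "optional", "establish-relationship-of-lawyer-to-party"),
        ("t1", "head", "action-taken-by-the-plaintiff"),
        ("t2", "head", "elaboration-of-case-perspective"),
    ],
)
```

After the first sentence of the dialogue, the precondition is satisfied, but the script is not yet complete, since the elaboration of the case perspective has not been observed.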


In addition to the instantiation of the discourse script above, the semantic
representation of the “desire to sue”, the lawsuit frame from the parser, is bound as the
value of the conceptual-frame for this discourse space, and in particular, it is of type
misappropriation. The state of the discourse at this point (after the first sentence) is
represented by the following discourse-frame and bindings:


(26)

(define-cluster legal-discourse-frame discourse-frame
  participants ((hearer 'COUNSELOR)
                (speaker '((type attorney)
                           (infer from legal-rep attorney))))
  hearer-goal ()
  speaker-goal ((inform legal-rep))
  discourse-mode ((mode 'expect-inform))
  discourse-script ((script 'accept-information-about-case script))
  conceptual-frame ((lawsuit '(type $misappropriation))))


At this point the system operates in a top-down expectation-driven mode, triggered by
the value for the conceptual-frame slot. That is, $misappropriation is itself a script, and
the best-first control strategy used by CICERO chooses to instantiate the script as part of
its inferencing about the coherence relations in the (upcoming) text.

(27)

(define-cluster $misappropriate script
  "legal concept"
  participants
    ((plaintiff-corporation '((type corporation)
                              (inherit thru parent lawsuit *)))
     (defendant-corporation '((type corporation)
                              (inherit thru parent lawsuit *))))
  props
    ((plaintiff-product '((type product)
                          (infer from plaintiff-corporation product)))
     (defendant-product '((type product)
                          (infer from defendant-corporation product)))
     (misappropriated-knowledge '((type knowledge-about-a-product))))
  preconditions
    ((t0 '((code (produces plaintiff-corporation
                           plaintiff-product))))
     (t1 '((code (used-in plaintiff-product
                          misappropriated-knowledge)))))
  events
    ((t2 '((code $illegitimate-access-to-knowledge)))
     (t3 '((code (equal misappropriated-knowledge
                        (get-value defendant-product :knowledge-used)))))
     (t4 '((code $competitive-advantage)))))


This representation provides us with the logical arguments to a relation (the
entailments), as well as a large set of presuppositions that will direct the inferencing (to
establish the deep coherence) in later processing.
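
The slot annotations in (27), such as (inherit thru parent lawsuit *) and (infer from plaintiff-corporation product), suggest a simple resolution procedure, sketched below. The reading given to the two annotations, the dictionary encoding of frames, and the function name get_value as used here are assumptions of the sketch and should not be taken as the behavior of the flavors-based implementation.

```python
from typing import Dict

Frames = Dict[str, Dict[str, object]]

# Hypothetical frame store built from the opening paragraph of the dialogue.
FRAMES: Frames = {
    "lawsuit": {
        "plaintiff-corporation": "HACKINC",
        "defendant-corporation": "SWIPEINC",
    },
    "HACKINC": {"product": "AUTOTELL"},
    "misappropriate": {
        "parent": "lawsuit",
        # Assumed reading of (inherit thru parent lawsuit *): copy the parent's slot.
        "plaintiff-corporation": ("inherit", "plaintiff-corporation"),
        # Assumed reading of (infer from plaintiff-corporation product):
        # follow the filler of one slot and read an attribute off its frame.
        "plaintiff-product": ("infer", "plaintiff-corporation", "product"),
    },
}


def get_value(frames: Frames, frame: str, slot: str) -> object:
    """Resolve a slot, following inherit and infer annotations recursively."""
    spec = frames[frame][slot]
    if isinstance(spec, tuple) and spec[0] == "inherit":
        parent = frames[frame]["parent"]
        return get_value(frames, parent, spec[1])
    if isinstance(spec, tuple) and spec[0] == "infer":
        owner = get_value(frames, frame, spec[1])
        return get_value(frames, owner, spec[2])
    return spec
```

On this encoding, the plaintiff's product is never stated in the script itself; it is recovered from the lawsuit frame and the corporation's own frame.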

Notice that the discourse frame in (27) keeps a dual representation of the information
streaming in from the parser. For structural bookkeeping purposes, the $misappropriate
frame is bound to action-taken-by-the-plaintiff, in that it satisfies a particular structural
property of such preamble paragraphs. For deeper semantic coherence, however, the same
frame is bound to the type of a lawsuit, and carries the complex of information shown above
in (27).


There are two interesting aspects to the representation shown in (27):


1. Any inferences possible due to the presupposition-set of an utterance are computed by
CICERO  rather  than  the  expert  system. 

2.  The  exact  same  representation  is  used  for  understanding  text  as  for  generating  text. 

5.0 Conclusion

I have sketched in this paper a very rough model of discourse analysis based on a
level hypothesis, wherein the conflating factors of discourse interpretation have been teased
apart. In the previous section I attempted to demonstrate a working system, CICERO,
which is “aware” of these levels at the stages of analysis outlined above. The system,
however, is still incomplete at this point, in that it fails to adequately simulate and model
the speaker's belief space. Furthermore, the role of goal recognition as recovering the
speaker's intention was minimal, due to the nature of the interaction in the domain. These
topics are being addressed currently in our ongoing research.